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ABSTRACT 

In this work we consider time series with a finite number of discrete point changes. We assume 
that the data in each segment follows a different probability density functions (pdf). We focus 
on the case where the data in all segments are modeled by Gaussian probability density functions 
with different means, variances and correlation lengths. We put a prior law on the change point 
instances (Poisson process) as well as on these different parameters(conjugate priors) and give 
the expression of the posterior probality distributions of these change points. The computations 
are done by using an appropriate Markov Chain Monte Carlo (MCMC) technique. 

The problem as we stated can also be considered as an unsupervised classification and/or 
segmentation of the time serie. This analogy gives us the possibility to propose alternative mod- 
eling and computation of change points, which are more appropriate for multivariate signals, 
for example in image processing. 
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1. INTRODUCTION 

Figure 1 shows typical change point problems we consider in this work. Note that, very often 
people consider problems in which there is only one change point. ^ Here we propose to consider 
more general problems with any number of change points. However, very often the change point 
analysis problems need online or real time detection algorithms,^"^ while here, we focus only on 
off line methods where we assume that we have gathered all the data and we want to analyse 
it to detect change points who have been occured during the observation time. Also, even if we 
consider here change point estimation of 1-D time series, we can extend the proposed method to 
multivariate data, for example the images where the change point problems become equivalent 
to segmentation. One more point to position this work is that, very often the models used in 
change point problems assume to know perfectly the model of the signal in each segment, i.e., a 
linear or nonlinear regression model, while here, we use a probabilistic model for the signals 
in each segment which gives probably more generality and applicability when we do not know 
perfectly those models. 
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Figure 1. Change point problems description: In the first row, only mean values of the different 
segments are different. In the second row, only variances are changed. In the third row only the 
correlation strengths are changed. In the fifth row, the whole nature shape of their probability 
distribution have been changed. The last row show the change points 

More specifically, we model the time series by a hierarchical Gauss-Markov modeling with 
hidden varaibles which are themselves modeled by a Markov model. Though, in each segment 
which corresponds to a particular value of the hidden variable, the time series is assumed to 
be modeled by a stationnary Gauss-Markov model. However, we choosed a simple parametric 
model defined only with three parameters of mean fi, variance cr^ = 1/r and a parameter p 
measuring the local correlation strength of the neighboring samples. 

The choice of the hidden variable is also important. We have studied three different modeling: 
i) change point time instants t^, ii) classification labels Zn or iii) a Bernouilli variable g„ which 
is always equal to zero except when a change point occurs. 

The rest of the paper is organized as follows: In the next section we introduce the notations 
and fixe the objectives of the paper. In section 3 we consider the model with explicite change 
point times as the hidden variables and propose particular modeling for them and an MCMC 
algorithm to compute their a posteriori probabilities. In sections 4 and 5 we consider the 
two other aformentionned models. Finally, we show some simulation results and present our 
conclusions and perspectives. 



2. NOTATIONS AND MODELING 



We note hy x — [x{to), • • • , x{to + T)]' the vector containing the data observed from time to to 
^0 + T. We note hj t = [ti, - ■ ■ , tjv]' the unknown change points and note x = [xq, Xi, • • • , x^]' 
where Xn = [x(tn), x{tn + 1), ■ ■ ■ , x{tn+i)]\ n = 0, ■ ■ ■ , N represent the data samples in each 
segment. In the following we will have t^^i = T. 

Wc model the data Xn = [x{tn),x{tn + !),••• , x{tn+i)]', n = 0, - ■ ■ ,N in each segment by 
a Gauss-Markov chain: 



p{x{tn)) = J\f{nn,(Tl) 

p{x{tn + l)\x{tn + l-l)) = Af{pnX{tn + l-l) + {l-pn)^ln,(Tl{l-pl)), Z = 1, • " " , - 1 
with In = tn+1 -tn + 1^ dim [Xn] (l) 

Then we have 

In 

p(Xn) = p(x(tn))Y[p(x(tn + l)\x{tn + l -1)) 
1=1 

p{Xn) OC exp I -^{x{tn) " Pn)'^ 
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p{Xn) = E„) with S„ = (7^ Toeplitz([l, p„, , . . . , pjn]) (2) 

Noting by t = [ti, ■ ■ ■ ,tAr] the vector of the change points and assuming that the samples 
from any two segments are independent, we can write: 

isj-v2\ r 1 ^ 



p{x\t, 0, N) = llj\f {Pnl: S„) = m exp <^ J2i^n - PnlYK^^n - /^nl) 

n=0 \n=0 ^ ' J I n=0 



(3) 



where we noted 6 — {pn: cr-n: Pn: n = 0, ■ ■ ■ , N}. 

Note that 

N ^ N ^ N 

- \np{x\t, e, N) = ln(27r) + - ^ In |S„| - - - PnlYK'i^n - Pnl) (4) 
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and when the data are i.i.d., (S„ = Cn-f) this becomes 



- lnp(a.|*, e, N) = (T/2) ln(27r) + f^iL^ Ina^ - J] J"^^" (5) 

n=0 n=0 " 



Then, the inference problems we will be faced are the following: 



1. Infer on given x and t; 



2. Infer on t given x and 0; 

3. Infer on t and given x; 

4. Infer on given a;. 

5. Infer on t given x; 

It is clear that the first problem is the easiest. 

The classical maximum likelihood estimation (MLE) approach can handle only the first three 
problems by maximizing p{x\t, 0), respectively, with respect to 0, to t and jointly (t, 0): 

• Estimating given x and t: = argmaxg {p(x\t, 0)} 

• Estimating t given x and 0: t — argmax* {p{x\t, 0)} 

• Estimating t and given x: {t, 0) = argmax(t^0) {p{x\t, 0)} 

However, we must be careful to check the boundedness of the likelihood function before using 
any optimization algorithm. The optimization with respect to when t is known can be done 
easily, but the optimization with respect to t is very hard and computationally costly. 

The two last problems cannot be handled easily because they need to define the likelihood 
fuctions p{x\0) and p{x\t) which need integrations with respect to t or of p{x\t, 0). There 
may not be possible to find analytical expressions for these integrals which may even not exist. 



3. BAYESIAN ESTIMATION OF THE CHANGE POINT TIME 



In Bayesian approach, one assigns prior probability laws on both t and and use the posterior 
probability law p{t, 0\x) as a tool for doing any inference. Choosing a prior pdf for t is also 
usual in classical approach. A simple model is the following: 



where are assumed iid end A is the a priori mean value of time intervals (i„ — tn-i). if N is 
the number of changepoint we can take A = jTrr- With this modeling we have : 



INSTANTS 



tn = tn-l + tn with 



en ~ V{X), 



(6) 




(7) 



With this prior selection, we have 



p{x, t\0, N) = p{x\t, 0, N)p{t\\, N) 



(8) 



and 



p{t\x, e, N) oc p{x\t, e, N) p{t\X, N) 



(9) 



In Bayesian approach, one goes one step further with assigning prior probabihty laws to the 
hyperparameters 0, i.e., p{0) and then one writes the joint a posteriori: 



p{t, d\x, A, N) (X p{x\t, e, N) p{t\X, N) p{e\N) (10) 
where here we noted — a^, Pn, n — 1, ■ • ■ , N}. 

To go further in details, we need to assign p{6). The following is our selection: 

p{^n) = A/'(/io,0-o) 

= U{[0,1]) 

which correspond mainely to the conjugate or reference priors. 

Given all these, we propose the following Gibbs MCMC algorithm: 

Iterate until convergency 

. sample t using p{t\x,d, N) 

. sample 9n '■ 

Hn using p{iin\x,t,N) 

(7^ using p{a'^\^x,t, N) 

Pn using p{pn\x,t,N) 



3.1. Sampling t using p{t\x^ 6, N) 

P. Fcarnhead showed^*^ that it is possible to perform perfect simulation of p{t\x,6,N) when 
wc have assumed that segments of data separated by a changepoint tn are independant. This 
simulation can be obtained by a method based on recursion on the changepoints. An approxi- 
mation of this method is possible to obtain an algorithm whose computational cost is linear in 
the number of observations. The main principle of this algorithm is to compute the following 
probabilities : 

Let note Xt-.s = [x{t),x{t + 1), . . . , x{s)], and 

R{t. s\X) = p{xt;s\t, s in the same segment. A) 
Q{t\X) — p (a;t:5 1 changepoint at i — 1, A), Q{1) — p{x\X) 

Let also note F{t\X) the associated cumulative distribution function of the prior density — 

t„_i|A) which is defined by (7). 

We compute R{t, s\X) with the following relation : 



R{t,s)\x) = / p{xt:s\d,x)p{d)de 



The computation of Q{t\X) can be done recursively by the following result : for i = 1, . . . , T, 



T-l 



Q{t\X) = J2 s\X)Q{s + l\X)V(s + 1 - i|A) + R(t, T\X)(1 - F(T - t\X)), 



s=t 



This result is shown by P. Fearnhead^'^ . And he also demonstrates that the posterior distribution 
of tn given tn-i is 



p{tn\tn-l,X,X) = 



R{tn-1, tn\X)Q{tn + l|A)P(t„ - t„_i|A) 



Q{tn-l\X) 

and the posterior distribution of no further changepoint is given by 

R{tn-i,T\X)il-F{T-tn-i-l\X)) 



p{tn = T\tn-l,X,X) = 

3.2. Sampling 9n using p(On\x,t, N) 

We may note that, thanks to the conjugacy, we have: 



Q{tn-l\X) 



P{l^n\x,t) = NiV^n.'^l) with 



p{al\x,t) = ig{an,Pn) with 



a„ = ao + Y 

3n = /?0 + \{Xn - fln^yR-^iXn - 



where Rn — Toephtz([l, p^, • • • , Pn ])• Then the simulation of these densities is quite simple. 



p{pn\x,t) is not a classical law. Its expression is given by : 

N 

P{pn\x,t,N) = YlpiPn\Xn,t,N) 

n=0 

In 

" i^ST^) ' -p{-^;|(i^K-/'"i)X-'(-»-M»i)} 

/ 1 \ ^ f 1 

" -p}-s;i(r3^EW'» + o-P»x(t„ + i-i)-(i-p„ 

Then we can not sample easily this density. 

The solution we propose is to use, in this step, a Hastings-Metropolis algorithm for sampling 
this density. As an instrumental density we propose to use a Gaussian approximation of the 
posterior density, i.e., we estimate the mean m^^ and the variance cXp^ of p(p„|cc, t, A^) and 
we use a Gaussian law A/'(mp„ , cr^^ ) to obtain a sample. This sample is accepted or rejected 



following p{pn\x,t, N). In practice we compute m^^ and a^^ calculating by approximation 
their definition : 

mp„ — ' Pn p{p„\x,t,N) 



— ' [ pI P{Pn\x,t,N) -ml^ 
Jo 



4. OTHER FORMULATIONS 



Other formulation can also exist. We introduce two sets of hidden variables 
z = [z(io), • • • , -^(^0 + 7^)]' and q = [?(to), • • • , q{t^ + T)]' 

where 

r 1 \iz{t)i^z{t-\) ^ r 1 if t = t„,n-0,--- ,7V 
^ ' 1^ elsewhere |^ elsewhere ' 

and where z(t) takes an integer value k in each segment : A; = 1,...,A^ + 1. With these two 
related hidden variables, we can propose two other modeling to be used in change point analysis. 
For example, q can be modeled by a Bernouilli process 

P(Q = q) = A^^*(l - A)^/^-^^) = A^^*(l - Xf-^i'ii 

and z can be modeled by a Mrkov chain, ie., {^(i), i = 1, • • • , T} forms a Markov chain: 

P[z{t) = -k)=VH. k = l,---,K, 

P{z{t)^k\z{t-l)^l)^Pku with EfePfc/ = l- ^ ' 

These two models are related. In the first one, A plays the role of the mean value of the segment 
lengths and in the second Pk and p^i give more precise control of the segment lengths. In the 
multivariate case, or more precisely in bivariate case (image processing), q may represent the 
contours and z the labels for the regions in the image. Then, we may also give a Markov model 
for them. For example, if we note hy r E S the position of a pixel, S the set of pixels positions 
and by V(r) the set of pixels in the neighorhood of the pixel position r, we may use an Ising 
model for q 

P{Q = qr) oc exp I -p J] J] 6{z{r) - z{s)) I (13) 

[ res seV(r) J 

or a Potts model for z: 

P{z) oc exp 1 -p 5] 5] 6{z{r) - z{s)) i . (14) 

[ res sGV(r) J 

where rho in the first controls the mean lengths of the contours in the image and in the second 
the mean surface of the regions in the image. Other more complexe modelings are also possible. 

With these auxiliary variables, we can write 

N N 

Zj = n)M{iin'^, S„) = J^pfejV(p„l, S„) (15) 

n=l n=l 

if we choose K = N . Here, = {N, an,Pn, n — 1, - ■ ■ , A^} , {pki, k,l — 1, - ■ ■ , N)} and the 
model is a mixture of Gaussians. 

We can again assign appropriate prior law on and give the expression of p(z, 6\x) and do 
any inference on z, 0. 



Finally, we can also use q as the auxiliary variable and write 

p{x\q,e) = (27r)-^/2(|nV^")^^p|-^E(^(^")-/^")'} 

+ (2vr)-(^-^)/^ (^fl l/ai'"-) j exp f^il - (x, - x,.,y 

(16) 

and again assign appropriate prior law on 6 and give the expression of p{q,6\x) and do any 
inference on q, 6. Wc are still working on using these auxiliary hidden variables particularly for 
applications in data fusion in image processing and we will report on these works very soon. 



5. SIMULATION RESULTS 

To test the feasability and to mesaure the performances of the proposed algorithms, we generated 
a few simple cases corresponding to only changes of one of the three parameters cr^ and 
In each case we present the data, the histogram of the a posteriori samples of t during the first 
and the last iterations of the MCMC algorithm. For each case we also give the vahie of the 
parameters used to simulate the data, the estimated values when the changepoints are known 
and the estimated values by the proposed method. 



5.1. Change of the means 

We can see in figure 2 that we obtain precise results on the position of the changepoints. In 

the case of change of means, the algorithm is very fast to converge to the good solution. In fact 
it needs only few iterations (about 5). The main cause of this results is the importance of the 
means in the likelihood p{x\t, d, N). 

We can also see in table 1 that the estimations of the means are very precise, particularly when 
the size of the segment is long. 
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Figure 2. Change in the means, up to down : simulated data, histogram in the 50th iteration, 
histogram in the first iteration, real position of the changepoints. 
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Table 1. Estimated value of the means 



5.2. Change in the variances 

We can see in figure 3 that we have again good results on the position of the changepoints. 
However, for httle difference of variances, the algorithm give an uncertainty on the exact position 
of the changepoint. This can be justified by the fact that the simulated data give itself this 
uncertainty. 

In table 2 we can see again good estimations on the variances on each segments. 
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Figure 3. Change in the variances, up to down : simulated data, histogram in the 50th 
iteration, histogram in the first iteration, real position of the changepoints. 
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Table 2. Estimated value of the variances 



5.3. Change in the correlation coefficient 

The results showed in figure 4 are worse than in the two first cases. The position of the 
changepoints are less precise, and we can see that another changepoint appears. This affects the 
estimation of the correlation coefficient in the third segment because the algorithm alternates 
between two positions of changepoint. This problem can be justified by the fact that a value of 
the correlation coefficient near 1 implies locally a change of the mean, which can be considered 
by the algorithm as a changepoint. Also this problem appears when the size of the segments 
are far from the a priori size A. 
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Figure 4. Change in the correlation coefficient, up to down : simulated data, histogram in the 
50th iteration, histogram in the first iteration, real position of the changepoints. 
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Table 3. Estimated vaue of the correlation coefficients 



5.4. Influence of the prior law 

In this section we study the influence of the a priori on A, i.e., the size of the segments. In the 
following we fix the number of changepoints as before and we change the a priori size of the 
segments by Aq = f and Ai = 2A. We apply then our algorithm on the change of the correlation 
coefficient. 
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Figure 5. Different correlation coefficient with Aq = ^j^- up to down : simulated data, 
histogram in the 50th iteration, histogram in the first iteration, real position of the changepoints. 

In figure 5, we can see that the algorithm has detected other changepoints, forming segments 
whose size is near Aq. This result shows the importance of the a priori when the data are not 
enough significant. We can also see this conclusion in figure 6 where only three changepoints 
are detected, forming segments whose size is again near Ai. We can also remark that fixing a 
priori a size A comes down to fix the number of changepoints. Our algorithm give then good 
results for instance if we have a good a priori on the number of changepoints. 
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